Journal of Theoretical and Applied Information Technology
15th April 2024. Vol.102. No 7
© Little Lion Scientific 




ISSN: 1992-8645 E-ISSN: 1817-3195 


SMART CROP PREDICTION USING STATISTICAL
TECHNIQUES OF MACHINE LEARNING THROUGH IoT 


PRABHAT KUMAR SAHU¹, SANGAM MALLA¹, MITRABINDA KHUNTIA¹, SMITA RATH¹,
MANJUSHREE NAYAK²


¹Department of Computer Science and Engineering, Siksha 'O' Anusandhan Deemed to be University,
Bhubaneswar, Odisha, India
²Department of Computer Science and Engineering, NIST, Berhampur, Odisha, India


E-mail: ¹prabhatsahu@soa.ac.in, ²sangam.malla2015@gmail.com, ³mitrabindakhuntia@soa.ac.in,
⁴smitarath@soa.ac.in, ⁵drmanjushreemishra@gmail.com


Abstract 


The paper highlights the significance of agronomy in developing countries and the challenges faced in 
traditional farming methods, which rely heavily on human intervention. The solution proposed involves 
leveraging automation in agriculture, specifically using Internet of Things (IoT) sensors. By employing 
regular sensing and examination of crops through IoT sensors, combined with Machine Learning and 
statistical techniques, the system aims to predict the appropriate crop for an area that depends on factors 
like moisture in the soil, temperature, and humidity. The research emphasizes the need for such technology, 
particularly in countries like India, where agriculture is a predominant occupation. The system is designed 
to address issues such as repeated cultivation of the same crops and indiscriminate fertilizer use, which 
negatively impact crop yield and soil health. Ultimately, the proposed system intends to offer farmers 
insights into the best-suited crops for their land, along with information on required fertilizers and seeds, 
aiming to enhance profitability, encourage crop diversification, and mitigate soil pollution. 


Keywords:- Smart Agronomy, Soil Fertility, Machine Learning, Statistical Techniques, Smart Crop 
Prediction. 
1. INTRODUCTION


Agriculture is a very important occupation for a large part of the Indian population and it also
makes a substantial contribution to the Indian economy. In recent years, the grain yield of
agriculture in India has declined and, as a consequence, crop prices have risen [1,2].

There are many reasons for the decline in crop production, such as climate change, disease,
fertilizer abuse, low soil fertility and water wastage. With this growing concern in mind,
changes are needed in agriculture, and a promising solution to this problem is to combine
wireless sensors with IoT [3,4]. The Internet of Things keeps everything connected over the
internet. Its main purpose is to provide the right information to the right person at the right
time [5, 30]. Water for irrigation is a key factor in agriculture, while the timing and
magnitude of season-wise rain cannot be predicted in advance [6, 31].

Need of Prediction and automatic irrigation:

• In automatic irrigation, farmers have the knowledge as well as the provision to supply the
required amount of water at the required time [7].
• This method saves more energy and resources.
• It is easy to predict the type of crop suitable for that region.
• This method automatically controls the soil moisture level, thus decreasing human
errors [10, 23].
• In a greenhouse, the type of vegetable that can be grown is easily predicted using moisture,
temperature and soil fertility sensors.
• Motors are operated automatically through controls, with no need for labor to turn the
motor on or off manually [11].
• It also reduces runoff from overwatering and over-fertilizing saturated soils, which
improves crop productivity [12,22].
• It also reports the exact amount of pesticide required for a particular type of crop.

This work applies IoT technology to agriculture: environmental parameters that affect plant
growth are monitored at a fixed location to help farmers find problems in time [13, 14, 21].
Agriculture professionals deliver guidance backed by statistics to raise farmers' earnings and
assist them with the prevention and management of crop diseases and pests [15, 16, 25]. With
the development of different apps, various products promote agricultural technology along with
valuable online FAQs answered by professionals [24, 26]. The system can be improved further
through enhancements to the server, Android and PC components and to its extensibility.


2. LITERATURE SURVEY


Specialists collect and analyze information in order to compare traditional work productivity
with atmospheric impact. Their focus is crop observation: temperature and precipitation data
are collected and analyzed to devise methods that limit crop failures and improve plant
production [17, 27]. Basic surveillance in an automated irrigation system helps to watch the
crop fields and send appropriate data to the control system [18, 29]. The data collected from
the wireless sensors are then forwarded to the server database over a wireless transmission
network [19,20]. During automatic irrigation, if the temperature falls or any other condition
moves outside the range specified by the user, the system detects it, informs the user and
provides an interface through which the user can take the necessary steps to protect the crops
or to supply proper nutrition for their growth [30].

➢ Prof. K. A. Patil and Prof. N. R. Kale had already initiated a new model of smarter
irrigation through ICT. Environmental information over the entire period, including
historical data, is expected to help achieve economical management and utilization of
resources [1].

➢ Mahammad Shareef Mekala and Dr. P. Viswanathan examined some typical uses of agricultural
IoT sensor monitoring network technologies that apply cloud computing as a backbone
[8, 32, 33].

➢ Prathibha S. R., Anupama Hongal and Jyothi M. P. created temperature recording sensors that
record the temperature of the agricultural field, capture the situation through a camera
and send it to the user for necessary action [3, 34, 35].

➢ A paper in the International Journal of Engineering Science Research Technology applies
machine learning techniques to predict crop productivity in Tamil Nadu. Using the Random
Forest algorithm, the paper predicts crop yield from existing data. The samples and models
were tested on data collected from Tamil Nadu data centers. The paper concludes that the
Random Forest algorithm can be used for accurate crop yield prediction [9].


3. PROBLEM STATEMENT


Modern agriculture faces numerous challenges such as inefficient resource utilization,
unpredictable weather patterns, and the need for increased productivity to meet the growing
global demand for food. In addressing these challenges, the integration of Internet of Things
(IoT) technologies into agricultural practices, commonly known as Smart Agriculture, has
emerged as a promising solution. However, several critical issues need to be addressed to
ensure the successful implementation and widespread adoption of Smart Agriculture.

The initial investment required to deploy IoT infrastructure in agriculture, including
sensors, actuators, and communication networks, can be prohibitive for small and medium-sized
farmers. Cost-effective solutions are needed to make Smart Agriculture accessible to a broader
range of agricultural practitioners.

Many farmers may not have the necessary technical skills to effectively operate and maintain
IoT devices. Training programs and user-friendly interfaces are needed to empower farmers with
the knowledge and skills required to leverage Smart Agriculture technologies.

Different crops, climates, and farming practices require tailored solutions. Creating a
flexible framework that allows customization to suit the unique needs of diverse agricultural
settings is essential for the widespread adoption and success of Smart Agriculture.

Machine learning (ML) models have demonstrated remarkable success in various domains, ranging
from healthcare to finance, yet the deployment and optimization of these models present a set
of challenges that need to be addressed for widespread and effective utilization. Machine
learning models rely heavily on data quality, and biases present in training data can lead to
biased predictions. Ensuring data quality, addressing bias, and promoting fairness in models
are critical concerns, especially in applications that impact individuals or societal groups.

As machine learning models become more complex, deploying them at scale presents challenges
related to computational resources, memory requirements, and real-time processing. Optimizing
models for efficiency while maintaining high accuracy is a delicate balance, particularly in
resource-constrained environments. Addressing these challenges is essential for realizing the
full potential of machine learning models across diverse applications. Overcoming these
obstacles will contribute to the development of more transparent, reliable, and ethical
machine learning solutions that can be effectively integrated into real-world scenarios.


3. PROPOSED WORK 


The proposed system aims to predict the optimal 
crop for a specific piece of land by considering key 
factors like contents present in the soil and 
parameters affecting the weather. The block
diagram is shown in Figure 1 and the data flow
diagram is shown in Figure 5.

A. Sensor Data Collection: - The rain detection, temperature and humidity sensors record the
rainfall, temperature, dampness and humidity of the agricultural field and send the data to
the Arduino Uno control system for further action.

B. Wireless data transfer: - A Wi-Fi module is used to send the recorded data to the web
server over any wireless network.

C. Data handling and Decision-making: - The whole system depends on the data recorded by the
sensors. Once the sensor data has been received, the values are checked in the program code,
and if any abnormality is found the control system switches the motor ON or OFF accordingly.

D. Automation and Irrigation system: - To implement the automated irrigation system, the
system combines a number of relays, a control system and web servers, which work together to
take the necessary steps depending on the sensor data.

E. Web Application: - A web application is necessary because it lets users retrieve the data
recorded by the sensors and take vital steps, if required, through a user-friendly interface.

F. Mobile Application: - In the era of smartphones, a mobile application with access to the
web server over any wireless connection helps the user take vital steps at any time and from
any place.
G. Machine Learning Algorithm: - The machine learning algorithms are used for prediction, that
is, the optimized estimation of likely outcomes based on training data. Predictive analytics
combines data, statistical algorithms and machine learning to identify the likelihood of
future outcomes by analyzing historical data. The system employs supervised machine learning
algorithms, specifically the subcategories of classification and regression. In this context,
the Decision Tree classification algorithm is deemed most suitable for predicting crops, while
the Support Vector Machine (SVM) algorithm is used for rainfall prediction within the system.
H. Prediction of Rainfall: - The process begins with loading an external dataset containing
rainfall data from previous years. The dataset is then pre-processed as outlined in the Data
Pre-processing section. After this step, the model is trained using an SVM classifier with a
Radial Basis Function (RBF) kernel. The classifier is fitted to the training set, and the
Radial Basis Function is expressed mathematically by equation (1).

$$V(p_1, p_2) = \exp\left(-\gamma \,\lVert p_1 - p_2 \rVert^{2}\right) \qquad (1)$$

where $\lVert p_1 - p_2 \rVert$ is the Euclidean distance between $p_1$ and $p_2$, and
$\gamma$ is the parameter termed gamma.

After fitting and testing the model, it is used to predict the annual rainfall. The predicted
rainfall serves as one of the input parameters for the crop prediction system.

I. Crop Prediction: - The crop prediction process begins by loading external crop datasets,
followed by the pre-processing stages detailed in the Data Pre-processing section. After
pre-processing, the models are trained using a Decision Tree classifier on the training set.
To predict the crop, temperature, humidity, soil pH and predicted rainfall are used as input
parameters for the system. These parameters can be entered manually or obtained from sensors.
The input values, including the predicted rainfall, are appended to a list, and the Decision
Tree algorithm uses this list to predict the crop.

J. Crop Recommendation: - The system recommends the most suitable crop for cultivation based on
predicted rainfall, soil contents and weather parameters. Additionally, it provides
information on required fertilizers such as Nitrogen (N), Phosphorus (P) and Potassium (K) in
kilograms per hectare, along with the necessary seed quantity in kilograms per acre for the
recommended crop. Furthermore, the system includes features like displaying current market
prices and the approximate yield in quintals per acre for the recommended crop. These details
aim to assist farmers in selecting the most profitable crop.
Figure 1: Block Diagram Of The Proposed Work
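
A Decision Tree classifier, as used for crop prediction, grows by repeatedly choosing the
feature and threshold that leave the child nodes as pure as possible. The sketch below shows a
single such split search. The paper does not state the split criterion or the library used, so
the Gini impurity, the function names and the sample records are assumptions for illustration
only.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def best_split(rows: List[Sequence[float]], labels: List[str]) -> Tuple[int, float, float]:
    """Return (feature_index, threshold, weighted_gini) of the best binary split.

    Candidate thresholds are midpoints between consecutive distinct values of
    each feature; rows with value <= threshold go to the left child.
    A feature index of -1 means no split improves on the parent node.
    """
    best = (-1, 0.0, gini(labels))
    n = len(rows)
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (feature, threshold, score)
    return best


if __name__ == "__main__":
    # Hypothetical records: [temperature, humidity, soil pH, predicted rainfall].
    rows = [
        [25.0, 80.0, 6.5, 220.0],
        [24.0, 82.0, 6.4, 230.0],
        [30.0, 50.0, 7.0, 60.0],
        [31.0, 48.0, 7.2, 55.0],
    ]
    labels = ["rice", "rice", "millet", "millet"]
    print(best_split(rows, labels))
```

A full tree applies the same search recursively to each child node until the nodes are pure or
a depth limit is reached.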


3.1 Hardware Used


• Soil Humidity Sensor: - The sensors used to check the humidity of the soil of the
agricultural field are termed soil humidity sensors, as shown in Figure 2. The sensors can be
either analog or digital. We use digital sensors in order to receive static output and analog
sensors to receive threshold output.

Figure 2: Soil Humidity Sensor


• Temperature Sensor: We have used the LM35 sensor, shown in Figure 3, which runs from a 5 V
supply and has a large operational range. Its output voltage in mV is directly proportional to
the temperature. The sensor reads temperatures from -55 to 150 degrees Celsius while consuming
minimal power.

• Humidity Sensor: Because of its very small size, considerably low power consumption and
signal transmission distance of about 20 meters, the DHT11 humidity sensor is a good choice for
various applications. It is a composite humidity sensor with a digitally calibrated signal
output that ensures versatility and stability.

Figure 3: Temperature Sensor


• Rain Detection Sensor: This sensor contains a resistance-type moisture measurement board
that detects raindrops falling on it and also measures the rain intensity. It is connected to
a power supply, which may be 5 V. When the induction board has no raindrops, the LED glows and
the DO output is high. When a small amount of water drips onto the board, the DO output goes
low and the switch indicator lights up. Once the water droplets are brushed off and the board
returns to its initial state, the output goes high again.

• Wi-Fi Module: The Wi-Fi module adds Wi-Fi functionality through serial UART communication;
its features include the 802.11 protocol
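
The sensor behaviour described above can be turned into the values the control system acts
on. The sketch below is written in Python for readability rather than as Arduino firmware, and
it is not the authors' code. It assumes a 10-bit analog-to-digital converter with a 5 V
reference (as on the Arduino Uno) and the LM35 scale factor of 10 mV per degree Celsius; the
moisture thresholds and all function names are hypothetical.

```python
ADC_STEPS = 1024          # 10-bit converter, assumed for the Arduino Uno
V_REF = 5.0               # assumed analog reference voltage in volts
LM35_MV_PER_DEG_C = 10.0  # LM35 scale factor: 10 mV per degree Celsius


def lm35_celsius(adc_reading: int) -> float:
    """Convert a raw ADC reading of the LM35 output to degrees Celsius."""
    millivolts = adc_reading * V_REF * 1000.0 / ADC_STEPS
    return millivolts / LM35_MV_PER_DEG_C


def is_raining(do_level: int) -> bool:
    """The rain sensor's DO output is high (1) when dry and low (0) when wet."""
    return do_level == 0


def motor_command(soil_moisture_pct: float, raining: bool, motor_on: bool = False,
                  dry_below: float = 30.0, wet_above: float = 60.0) -> bool:
    """Return the new pump motor state (True = ON).

    dry_below and wet_above are hypothetical thresholds; between them the
    motor keeps its current state so that it does not toggle rapidly.
    """
    if raining or soil_moisture_pct >= wet_above:
        return False
    if soil_moisture_pct <= dry_below:
        return True
    return motor_on


if __name__ == "__main__":
    print(lm35_celsius(51))                           # about 24.9 degrees Celsius
    print(motor_command(20.0, raining=is_raining(1)))  # dry soil, no rain: motor ON
```

Keeping a band between the two thresholds is a design choice of this sketch: it prevents the
relay from switching on and off repeatedly when the moisture reading hovers near a single limit.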


Figure 4: Rain Detection Sensor

Figure 5: Data Flow Diagram


4. METHODOLOGY


In this research study, we use different IoT sensors together with machine learning approaches
to predict crop yield in different regions. The data collected from our IoT sensors is used in
this research work. SVM algorithms based on machine learning have been used, and accurate
predictions of crop yield have been achieved through the comparison of these algorithms.


4.1 Dataset


The dataset is based entirely on the records collected from the different IoT sensors. These
datasets are then fed to different ML algorithms. The ML algorithm evaluates the records and
predicts the most appropriate crop that can be grown in that particular area. Generally, the
amount of rainfall and the temperature affect the climate, which in turn affects crop
production in a particular region. The types of pesticide used and the quality of the soil
also affect the conditions for crop growth in that region. Our research therefore gives a
proper report on the type of crop in that particular area in India every year. The dataset is
reflected in Table 1.


4.2 Preprocessing


Data cleaning is the first task after gathering all the related data collected from the
sensors. After the data cleaning process, the researcher identifies the common columns in the
data and merges them into related data frames. Normalization is then applied in order to
maintain a common standard for every attribute. The final data frame thus includes features
such as type of crop, origin country, year of cultivation, yield value, average rainfall in mm,
pesticides used and temperature recorded. Different machine learning algorithms predict the
cultivation of crops, and a brief comparison of the algorithms has been carried out to obtain
the best results. To predict the best crop yield, we have used and compared the following
machine learning models. The final dataset used is shown in Table 1 and the correlation matrix
is shown in fig.06.
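
Normalization puts attributes measured on different scales, such as rainfall in mm and
temperature, onto a common range. The paper does not say which normalization is applied; the
sketch below assumes min-max scaling to the interval [0, 1], and the function name and sample
values are hypothetical.

```python
from typing import List


def min_max_normalize(column: List[float]) -> List[float]:
    """Scale a numeric column linearly so its minimum maps to 0 and maximum to 1."""
    low, high = min(column), max(column)
    if high == low:
        # A constant column carries no information; map every value to 0.0.
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]


if __name__ == "__main__":
    rainfall_mm = [50.0, 100.0, 200.0]  # hypothetical readings
    print(min_max_normalize(rainfall_mm))
```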


Table 1: Sample Of Final Dataset


Temperature | Humidity | pH | Rainfall | Label


Figure 6: Co-Relation Matrix After One Hot Encoding

• SVC:

SVMs are used to solve machine learning classification problems. The model used to implement
an SVM is called the Support Vector Classifier (SVC). It determines the amount of error that
can be accepted in the model and delivers a hyperplane to fit the data. In our research, SVC
proved to be the best way to implement SVM.


By combining the four data sets collected from the different sensors, the final data frame
with all required features is obtained. In order to reveal the relationships among the
features, a correlation matrix is developed. This correlation matrix is rendered as a heat
map, as shown in fig.06.
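
Each cell of such a correlation matrix holds the correlation coefficient between two feature
columns. The sketch below assumes the Pearson coefficient, which the paper does not name
explicitly; the function names and the sample columns are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equally long columns."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("columns must have the same length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    if spread_x == 0.0 or spread_y == 0.0:
        raise ValueError("correlation is undefined for a constant column")
    return covariance / math.sqrt(spread_x * spread_y)


def correlation_matrix(columns: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson correlations, keyed by column name."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


if __name__ == "__main__":
    # Hypothetical sensor columns.
    data = {
        "temperature": [24.0, 26.0, 30.0, 31.0],
        "humidity": [82.0, 78.0, 55.0, 50.0],
    }
    for name, row in correlation_matrix(data).items():
        print(name, row)
```

Values near 0 indicate that two features vary independently, while values near +1 or -1
indicate a strong linear relationship.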

Thus fig.06 shows that the variables have no correlation among themselves. Therefore, each
feature is independent. In our data frame, the two columns product and origin country contain
categorical data, that is, label values instead of numeric values. Many machine learning
algorithms cannot work directly on labeled data; they require numeric values for both input
and output variables. Therefore, the categorical data needs to be converted to numeric values
through an encoding process. The encoded form of the categorical data is then fed to the
machine learning algorithms to obtain better predictions. The product and origin country
columns of our data frame were converted to a numeric array through one-hot encoding.
Executing this encoding returns a matrix with a binary column for every category.
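
The encoding step can be sketched in a few lines. This is an illustration rather than the
authors' code; the function name and the sample labels are hypothetical.

```python
from typing import List, Tuple


def one_hot_encode(values: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return the sorted category names and one binary row per input value."""
    categories = sorted(set(values))
    position = {category: i for i, category in enumerate(categories)}
    matrix = []
    for value in values:
        row = [0] * len(categories)
        row[position[value]] = 1  # exactly one column is set per record
        matrix.append(row)
    return categories, matrix


if __name__ == "__main__":
    categories, matrix = one_hot_encode(["rice", "maize", "rice"])
    print(categories)
    for row in matrix:
        print(row)
```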

After pre-processing, the data is divided into training and testing sets with a split factor
of 0.7. This means that 70% of the dataset is used for training and the remaining 30% for
testing. The 70% training set is the primary data set and is given as input to the machine
learning algorithms to learn to predict the production value. The 30% test set is then used to
examine the accuracy of the models trained on the training data.
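
A split with a factor of 0.7 can be sketched as follows. Shuffling before the split and the
fixed random seed are assumptions of this sketch, added so that the result is reproducible;
the function name is hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(records: Sequence[T], train_fraction: float = 0.7,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle the records and split them into training and testing lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    train, test = split_train_test(list(range(10)))
    print(len(train), len(test))  # 7 training records and 3 test records
```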


5. EXPERIMENT AND RESULTS


The suggested system advises on the most fitting crop for a specific piece of land, taking into
account factors such as annual rainfall, temperature, humidity, and soil pH. The system
autonomously predicts annual rainfall using the SVM and RFC algorithms based on data from past
years, while the user needs to input the other parameters. In the results section, the system
presents information on the recommended crop, necessary seeds per acre, market price, and an
estimated yield for the suggested crop. Furthermore, the system considers NPK values in the
input section to provide details on the required Nitrogen, Phosphorus, and Potassium for the
recommended crop. The Confusion Matrix generated from SVM is shown in fig. 07. The ROC Curve
produced through SVM is shown in fig.08 and the class prediction error is shown in fig 09.

The classification report of SVM is shown in Table 2.

Table 2: Classification Report Of SVM Classifier 
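
A classification report of this kind lists precision, recall and F1-score for each class, all
of which are derived from the confusion matrix. The sketch below shows the calculation; the
labels and predictions are hypothetical and are not the results of this paper.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Count predictions; rows are true classes, columns are predicted classes."""
    position = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, prediction in zip(y_true, y_pred):
        matrix[position[truth]][position[prediction]] += 1
    return matrix


def classification_report(y_true: Sequence[str], y_pred: Sequence[str],
                          labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Per-class precision, recall, F1-score and support."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    report = {}
    for i, label in enumerate(labels):
        true_positive = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total (support)
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / actual if actual else 0.0
        total = precision + recall
        f1 = 2.0 * precision * recall / total if total else 0.0
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": float(actual)}
    return report


if __name__ == "__main__":
    truth = ["rice", "rice", "maize", "maize"]
    predicted = ["rice", "maize", "maize", "maize"]
    for label, scores in classification_report(truth, predicted, ["maize", "rice"]).items():
        print(label, scores)
```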
Fig. 07 SVM Confusion Matrix

Fig.08:- ROC Curve Of SVM

Fig.09:- Class Prediction Error For SVM


We used the R-squared value to compare the above-mentioned models. The coefficient of
determination, i.e. R^2, which is also referred to as the regression score function, is used
as the evaluation metric for all the models in our research work. The coefficient gives the
proportion of variance explained for the product column. The R^2 score indicates how closely
the data points lie to the fitted curve or line. After comparing all the values and obtaining
the result shown in fig.10, we conclude that the highest R^2 score is 94%, from the SVM
technique.
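
The coefficient of determination compares the squared prediction errors with the spread of the
observed values around their mean. A minimal sketch follows; the function name and sample
values are hypothetical and do not reproduce the reported 94% score.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot for observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    observed = [3.0, 5.0, 7.0, 9.0]
    predicted = [2.5, 5.5, 7.0, 9.5]
    print(r2_score(observed, predicted))
```

A score of 1.0 means the predictions match the observations exactly, and a score of 0.0 means
the model does no better than always predicting the mean.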


Fig.10- Result Shown In LED Screen After Sensor Detection 


MODEL RESULTS AND CONCLUSION 


Generally, the R-squared value indicates how well a model fits the observed data set. For
example, an R-squared value of 94% means that the regression model explains 94% of the
variance in the data. The model results are displayed in fig.10.


The higher the R-squared value, the better the model fits the data set. On this criterion, SVM
gives the best fit to the data set in our research work, as it returns the highest value of
94%.


We can also consider the node probability, which is calculated as the number of samples that
reach the node divided by the total number of samples. The feature importance rises with the
node probability. The node feature value [19, 20] is considered the best value recorded, and
the value recorded in our research work through SVM is [15, 17, 18].
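
The node probability described above can be computed directly, and in tree-based models it is
commonly used to weight the impurity decrease of a split when scoring feature importance. The
sketch below assumes that weighted impurity decrease formulation, which the paper does not
spell out; the function names and the numbers are hypothetical.

```python
def node_probability(samples_at_node: int, total_samples: int) -> float:
    """Fraction of all samples that reach a given node."""
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    return samples_at_node / total_samples


def node_importance(total_samples: int,
                    n_node: int, impurity_node: float,
                    n_left: int, impurity_left: float,
                    n_right: int, impurity_right: float) -> float:
    """Impurity decrease of a split, with each node weighted by its probability."""
    return (node_probability(n_node, total_samples) * impurity_node
            - node_probability(n_left, total_samples) * impurity_left
            - node_probability(n_right, total_samples) * impurity_right)


if __name__ == "__main__":
    # A root node holding all 100 samples that splits into two pure children.
    print(node_importance(100, 100, 0.5, 50, 0.0, 50, 0.0))
```

Under this formulation, a split near the root reaches more samples, has a higher node
probability and therefore contributes more to the importance of the feature it tests.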


6. CONCLUSION: 


An automated irrigation system based on IoT is described in this article. The smart irrigation
system comprises different IoT hardware, along with control systems, web servers, and cloud
communications. The system automatically records all the environmental parameters and sends
them to cloud storage over wireless communication. The user can then control the actions that
are carried out through the actuator. This allows the farmer to improve the crop as well as
predict the type of crop suitable for that particular region. The aim is to empower farmers to
make informed decisions, fostering agricultural development through innovative ideas.